Android: From Reversing to Decompilation
Authors
Abstract
This talk deals with the analysis of Android bytecode. The Android system is now widespread, and many applications are developed every day. These applications are mostly written in Java, though they can also call native binaries or shared libraries. To be executed on the Dalvik virtual machine (DVM), the Java source code is first compiled into Java bytecode (.class files), and a tool named ‘dx’ then converts it into the Dalvik format (.dex files). This conversion is needed because the DVM is a register-based machine whereas the JVM is a stack-based one, so the two have different opcodes. Because of the nature of bytecode, reversing it is somewhat easier than reversing machine code: unlike machine code, Dalvik bytecode contains semantic information that allows a better analysis. We can get useful details on variables, fields, methods, and so on. We can create signatures for a method, or use the Android permissions to see where a specific one is used in an application. The analysis also lets us extract the control flow graph, which is composed of basic blocks and represents the different possible paths through an application; because of the virtual machine, this graph cannot be modified dynamically. Furthermore, we have implemented new algorithms to calculate the similarity distance between two applications, which is useful for finding out whether your application has been stolen from the Android Market. Similarity can also be used for ‘diffing’ Android applications, which helps to spot bug patches or inserted malicious code; this is why we have developed a combination of techniques to quickly see the differences between two applications. Moreover, it is useful to be able to manipulate all these new formats (APK, DEX, Dalvik bytecode, Android’s binary XML) in a simple way, in order to automate testing directly in a program or in a specific interpreter.
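As an illustration of what a similarity distance between two pieces of code can look like, here is a minimal sketch based on the normalized compression distance (NCD) with zlib. The abstract does not say which algorithms the authors implemented, so NCD is a stand-in chosen for this example; the function names and the toy instruction strings are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed form of data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two non-empty byte strings.

    Values near 0 mean the inputs share most of their content;
    values near 1 mean they share almost nothing.
    NCD is an assumption for this sketch, not necessarily the authors' method.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Toy stand-ins for the instruction streams of two methods (hypothetical data).
    original = b"const/4 v0, 0\nif-eqz v1, :skip\ninvoke-virtual {v2}, run\n" * 20
    patched = original + b"invoke-static {v3}, sendSms\n"
    unrelated = bytes(range(256)) * 5

    print("original vs patched  :", round(ncd(original, patched), 3))
    print("original vs unrelated:", round(ncd(original, unrelated), 3))
```

In a real comparison the inputs would be derived from the methods or classes of each application rather than from raw strings, and pairs with a low distance would be treated as candidates for a closer diff.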
There are several ways to retrieve the Java source code of an application from its bytecode. For instance, some people use a tool that transforms Dex bytecode into Java bytecode and then combine it with a regular Java decompiler. But the resulting code often looks more like an obfuscated version that does not compile than like real source code. That is why we developed a new decompiler that works only from Dalvik bytecode to produce Java source code. We present Androguard, a new open-source tool written in Python (with some parts in C) that helps with the reversing of Android applications, as well as our decompiler.
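The control flow graph of basic blocks mentioned in the abstract is the usual starting point for analysis and decompilation of bytecode. The sketch below shows the classic "leaders" approach on a toy instruction list. The `Insn` type, the reduced opcode sets, and `build_cfg` are hypothetical names for this example only; they are not Androguard's API and not necessarily how the authors' tool works.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    """Toy instruction: a mnemonic plus an optional branch target (instruction index)."""
    op: str
    target: Optional[int] = None


# Reduced, illustrative opcode sets (real Dalvik has many more branch opcodes).
CONDITIONAL = {"if-eqz", "if-nez"}
UNCONDITIONAL = {"goto"}
TERMINATORS = {"return", "return-void", "throw"}


def build_cfg(code: List[Insn]) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Split code into basic blocks and compute successor edges.

    Returns (blocks, edges): blocks maps a block's first index to the
    instruction indices it contains; edges maps it to successor block starts.
    """
    if not code:
        return {}, {}

    # A leader starts a block: the first instruction, every branch target,
    # and every instruction that follows a branch or a terminator.
    leaders = {0}
    for i, insn in enumerate(code):
        if insn.op in CONDITIONAL or insn.op in UNCONDITIONAL:
            leaders.add(insn.target)
            if i + 1 < len(code):
                leaders.add(i + 1)
        elif insn.op in TERMINATORS and i + 1 < len(code):
            leaders.add(i + 1)

    starts = sorted(leaders)
    blocks: Dict[int, List[int]] = {}
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(code)
        blocks[start] = list(range(start, end))

    edges: Dict[int, List[int]] = {}
    for start, indices in blocks.items():
        last_index = indices[-1]
        last = code[last_index]
        successors: List[int] = []
        if last.op in UNCONDITIONAL:
            successors.append(last.target)
        elif last.op in CONDITIONAL:
            successors.append(last.target)          # branch taken
            if last_index + 1 < len(code):
                successors.append(last_index + 1)   # fall through
        elif last.op not in TERMINATORS and last_index + 1 < len(code):
            successors.append(last_index + 1)       # plain fall through
        edges[start] = successors
    return blocks, edges


if __name__ == "__main__":
    method = [
        Insn("const/4"),
        Insn("if-eqz", 4),
        Insn("const/4"),
        Insn("goto", 5),
        Insn("const/4"),
        Insn("return"),
    ]
    blocks, edges = build_cfg(method)
    for start in blocks:
        print(start, blocks[start], "->", edges[start])
```

A decompiler would typically go on to recognise structured constructs such as if/else and loops in this graph; that step is outside this sketch.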
Similar resources
C Decompilation: Is It Possible?
Decompilation is the reconstruction of a program in a high-level language from a program in a low-level language. The possibility and feasibility of decompilation have been a subject of controversy over the last years. We present several arguments supporting the idea that, although fully automatic decompilation is impossible, there exist methods and techniques that cover most of decompilation process for wide ...
Comparing Type-Based and Proof-Directed Decompilation
In the past couple of years interest in decompilation has widened from its initial concentration on reconstruction of control flow into well-founded-in-theory methods to reconstruct type information. Mycroft described Type-Based Decompilation and Katsumata and Ohori described Proof-Directed Decompilation. This note summarises the two approaches and identifies their commonality, strengths and wea...
Decompilation of Java bytecode to Prolog by partial evaluation
Reasoning about Java bytecode (JBC) is complicated due to its unstructured control-flow, the use of three-address code combined with the use of an operand stack, etc. Therefore, many static analyzers and model checkers for JBC first convert the code into a higher-level representation. In contrast to traditional decompilation, such representation is often not Java source, but rather some interme...
Securing Mobile Applications
Widespread mobile device use has stimulated a rich market for applications. Many apps, however, reveal sensitive user information such as location, movements, and habits1 and/or spread malware.2 Network anonymization techniques alone don’t ensure privacy because the OS together with the invoked mobile apps might still release information that reidentifies users or devices. Even when users are ...
High-Level Composite Type Reconstruction During Decompilation from Assembly Programs
This paper presents a method for automatic reconstruction of high-level composite types during decompilation of C programs from assembly code. The proposed method is based on expressing memory access operations as pairs (base+offset), then building sets of equivalence for all memory access bases used in the program and accumulating sets of offsets for all classes of equivalent bases. Experiment...